On ergodic least-squares estimators of the generalized diffusion coefficient for fractional Brownian motion
We analyse a class of estimators of the generalized diffusion coefficient for fractional Brownian motion $B_t$ of known Hurst index $H$, based on weighted functionals of the single-time square displacement. We show that for a certain choice of the weight function these functionals possess an ergodic property and thus provide the true, ensemble-averaged, generalized diffusion coefficient to any necessary precision from single-trajectory data, but at the expense of a progressively higher experimental resolution. Convergence is fastest around $H \simeq 0.30$, a value in the subdiffusive regime.
Single molecule spectroscopy techniques allow the tracking of single particles over a wide range of time scales […]. In complex media such as living cells, a number of recent studies have reported evidence for subdiffusive transport of particles like proteins […], viruses, chromosome monomers […], mRNA […] or lipid granules […]. Subdiffusion is typically characterized by a sublinear growth with time of the mean square displacement (MSD), $\mathbb{E}(B_t^2) = K t^{\nu}$ with $\nu < 1$, where $B_t$ is the particle position at time $t$, $\mathbb{E}$ denotes the ensemble average and $K$ is a generalized diffusivity.

A growing body of single-trajectory studies suggests that fractional Brownian motion (fBm), among the variety of stochastic processes that produce subdiffusion, may be a model particularly relevant to subcellular transport. fBm is a Gaussian continuous-time random process with stationary increments and is characterized by a so-called Hurst index $H = \nu/2$. If $H < 1/2$, trajectories are subdiffusive, with increments that are negatively and long-range correlated […]. Such correlations were observed for subdiffusing mRNA molecules, RNA-proteins or chromosomal loci […] within E. coli cells. Similarly, fBm can be used to describe the dispersion of apoferritin proteins in crowded dextran solutions [11] and of lipid molecules in lipid bilayers [12].

Whereas the determination of an anomalous exponent from data has been extensively studied, as it demonstrates deviation from standard Brownian motion (BM), the problem of estimating the generalized diffusion constant $K$ has received much less attention. It appears that $K$ is much more sensitive than $\nu$ to many biological factors, and its precise determination can potentially yield valuable information about the kinetics of transcription, translation and other physico-biological processes. The generalized diffusivity of RNA molecules in bacteria is greatly affected (either positively or negatively) by perturbations, for instance treatment with antibiotic drugs, which nevertheless have a negligible effect on $\nu$ […]. Likewise, the coefficient $K$ of lipids in membranes is strongly reduced by small cholesterol concentrations, whereas $\nu$ remains unchanged. In the context of search problems, a particle following a subdiffusive fBm actually explores three-dimensional space more compactly than a BM and can have a higher probability of eventually encountering a nearby target [13]. The larger the value of $K$, the faster this local exploration.

In this paper, generalizing our previous results for standard BM [3], we present a method to estimate the ensemble-averaged diffusivity $K$ from the analysis of single fBm trajectories with an a priori known anomalous exponent. Estimating diffusion constants from data is not an easy task when trajectories are few and ensemble averages cannot be performed. BM and fBm are ergodic processes and time averages tend to ensemble averages, but convergence can be slow. For finite trajectories of finite resolution, variations by orders of magnitude have been observed for estimators of the normal diffusion coefficient obtained from single particles moving along DNA […], in the plasma membrane or in the cytoplasm of mammalian cells [17]. Large fluctuations are also manifest in subdiffusive cases […, 12].

A broad dispersion in the measured values of the diffusion coefficient raises important questions about optimal fitting methodologies. A reliable estimator must possess an ergodic property, so that its most probable value converges to the true ensemble average independently of the trajectory considered and its variance vanishes as the observation time increases. Recently, much effort has been invested in the analysis of this challenging problem and several different estimators have been analyzed, based, e.g., on the sliding time-averaged square displacement […], the mean length of a maximal excursion [20], the maximum likelihood approximation [21, …] and optimal weighted least-squares functionals [14].

Our aim here is to determine an ergodic least-squares estimator for the generalized diffusion coefficient when the underlying stochastic motion is an fBm. The estimators considered here are single-time quantities, unlike others based on fits of two-time quantities such as the time-averaged MSD.

Let us consider a fractional Brownian motion $B_t$ in one dimension with $B_0 = 0$ and zero expectation value for all $t \in [0,T]$, where $T$ is the total observation time. The covariance function of the process is given by […]:

$$\mathrm{Cov}(B_t,B_s) = \mathbb{E}\{(B_t - \mathbb{E}\{B_t\})(B_s - \mathbb{E}\{B_s\})\} = D\left(t^{2H} + s^{2H} - |t-s|^{2H}\right), \qquad (1)$$

where $D$ ($= K/2$) is the generalized diffusion coefficient and the Hurst exponent $H \in (0,1)$. Setting $s = t$ in Eq. (1) gives $\mathbb{E}\{B_t^2\} = 2D\,t^{2H} = K t^{2H}$, i.e. the MSD law above with $\nu = 2H$. The Hurst index describes the raggedness of the resulting motion, with a higher value leading to a smoother motion. Standard Brownian motion is the particular case of fBm corresponding to $H = 1/2$. As already mentioned, for $H < 1/2$ the increments of the process are negatively correlated, so that the fBm is subdiffusive. On the other hand, for $H > 1/2$ the increments of the process are positively correlated and superdiffusive behavior is observed.
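The covariance (1) also gives a direct way to generate test trajectories. The following is a minimal sketch, not part of the analysis in this paper: it samples fBm exactly on a finite grid by Cholesky factorization of the covariance matrix built from Eq. (1). NumPy, the helper names `fbm_covariance` and `fbm_cholesky`, and the use of a grid that excludes $t = 0$ (where the matrix would be singular because $B_0 = 0$) are our assumptions; the $O(n^3)$ cost limits it to moderate grid sizes.

```python
import numpy as np


def fbm_covariance(times, H, K=1.0):
    """Covariance matrix from Eq. (1) with D = K/2."""
    t = np.asarray(times, dtype=float)[:, None]
    s = np.asarray(times, dtype=float)[None, :]
    return 0.5 * K * (t ** (2 * H) + s ** (2 * H) - np.abs(t - s) ** (2 * H))


def fbm_cholesky(times, H, K=1.0, n_paths=1, rng=None):
    """Exact fBm samples at strictly positive times; shape (n_paths, len(times))."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(fbm_covariance(times, H, K))
    z = rng.standard_normal((n_paths, len(times)))
    # Rows of z have identity covariance, so rows of z @ L.T have covariance L @ L.T.
    return z @ L.T


if __name__ == "__main__":
    T, n, H, K = 1.0, 200, 0.3, 2.0
    times = np.linspace(T / n, T, n)
    paths = fbm_cholesky(times, H, K, n_paths=20000, rng=1)
    # Ensemble MSD at t = T; it should be close to K * T**(2H) = 2.0.
    print(np.mean(paths[:, -1] ** 2))
```

With 20000 paths the printed MSD should lie within a few percent of $K T^{2H}$. Faster exact samplers exist (e.g. circulant embedding), but the Cholesky route is the simplest to check against Eq. (1).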

We consider a single trajectory $B_t$, that is, a particular realization of an fBm process with a known $H$, and write down the following weighted least-squares functional:

$$F = \int_0^T dt\, W(t)\left(B_t^2 - K_f\, t^{2H}\right)^2, \qquad (2)$$



where $W(t)$ is some weighting function to be determined afterwards and $K_f$ is a trial parameter. We call $K_f$ an estimate of the generalized diffusion coefficient from the single trajectory $B_t$ if it minimizes $F$. Calculating the partial derivative $\partial F/\partial K_f$, setting it to zero and solving the resulting equation for $u = K_f/K$, we find the following least-squares estimator of the generalized diffusion coefficient $K$:

$$u = \frac{K_f}{K} = \frac{1}{K}\,\frac{\int_0^T dt\, \omega(t)\, B_t^2}{\int_0^T dt\, t^{2H}\, \omega(t)}, \qquad (3)$$

where we have introduced the notation

$$\omega(t) = t^{2H}\, W(t). \qquad (4)$$
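For completeness, the minimization leading from Eq. (2) to Eqs. (3) and (4) can be written out explicitly. This is only a restatement of the calculation described above, using the definitions already introduced; the final remark assumes a positive weight $W(t)$.

```latex
\frac{\partial F}{\partial K_f}
  = -2\int_0^T dt\, W(t)\, t^{2H}\left(B_t^2 - K_f\, t^{2H}\right) = 0
\;\Longrightarrow\;
K_f = \frac{\int_0^T dt\, W(t)\, t^{2H} B_t^2}{\int_0^T dt\, W(t)\, t^{4H}}
    = \frac{\int_0^T dt\, \omega(t)\, B_t^2}{\int_0^T dt\, t^{2H}\,\omega(t)},
\qquad \omega(t) = t^{2H} W(t).
```

Since $\partial^2 F/\partial K_f^2 = 2\int_0^T dt\, W(t)\, t^{4H} > 0$ for $W > 0$, this stationary point is indeed a minimum of $F$.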



Note that the estimator $u$ measures the ratio of the observed generalized diffusion coefficient for a single given trajectory to the ensemble-averaged value. Moreover, since $\mathbb{E}\{B_t^2\} = K t^{2H}$, $\mathbb{E}\{u\} = 1$ holds for any weight function $\omega(t)$, making it possible to compare the effectiveness of different choices of $\omega(t)$. It is worth remarking that $u$ is given by a single time integration (a local functional) and thus differs from other estimates used in the literature, which involve two-time integrals (see, e.g., [15]).
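The unbiasedness of Eq. (3) can be illustrated numerically. In the sketch below (an illustration under stated assumptions, not the authors' code) the integrals of Eq. (3) are replaced by the trapezoidal rule on a uniform grid that includes $t = 0$; because the same quadrature weights enter the numerator and the denominator, the discretized estimator remains exactly unbiased. Trajectories are drawn by Cholesky factorization of Eq. (1), re-implemented here so the block is self-contained. The function names, the grid, and the parameters `t0` and `a` of the example weight $\omega(t) = (t_0 + t)^{-a}$ are hypothetical choices.

```python
import numpy as np


def trapezoid_weights(t):
    """Weights q_i such that sum_i q_i f(t_i) is the trapezoidal estimate of int f dt."""
    dt = np.diff(t)
    q = np.zeros_like(t, dtype=float)
    q[:-1] += 0.5 * dt
    q[1:] += 0.5 * dt
    return q


def estimate_u(B, t, H, K, omega):
    """Discretized Eq. (3); B has shape (n,) or (n_paths, n) on the grid t."""
    q = trapezoid_weights(t) * omega(t)
    return (B ** 2 @ q) / (K * np.sum(q * t ** (2 * H)))


def sample_fbm(t, H, K, n_paths, rng):
    """Exact fBm on a grid with t[0] = 0 (so B_0 = 0), via Cholesky of Eq. (1)."""
    tt, ss = t[1:, None], t[None, 1:]
    cov = 0.5 * K * (tt ** (2 * H) + ss ** (2 * H) - np.abs(tt - ss) ** (2 * H))
    L = np.linalg.cholesky(cov)
    paths = rng.standard_normal((n_paths, len(t) - 1)) @ L.T
    return np.hstack([np.zeros((n_paths, 1)), paths])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, K, t0 = 0.3, 1.0, 0.01
    t = np.linspace(0.0, 1.0, 201)
    B = sample_fbm(t, H, K, n_paths=5000, rng=rng)
    for a in (0.0, 1.0 + 2.0 * H):
        u = estimate_u(B, t, H, K, lambda s: (t0 + s) ** (-a))
        # The sample mean of u should be close to 1 for every weight.
        print(f"a = {a:.1f}: mean(u) = {u.mean():.3f}, var(u) = {u.var():.3f}")
```

The sample means fluctuate around 1 for both weights, while the sample variances differ; comparing those variances is exactly what the analysis below does analytically.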

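Further on, a straightforward calculation gives the variance of the estimator $u$ for an arbitrary weight function $\omega(t)$:

$$\mathrm{Var}(u) = \frac{\int_0^T dt \int_0^T ds\, \omega(t)\,\omega(s)\, \mathrm{Cov}(B_t^2, B_s^2)}{K^2\left(\int_0^T dt\, t^{2H}\,\omega(t)\right)^2}, \qquad (5)$$

where $\mathrm{Cov}(B_t^2, B_s^2)$ is the covariance function of a squared fBm trajectory,

$$\mathrm{Cov}(B_t^2, B_s^2) = \mathbb{E}\{(B_t^2 - \mathbb{E}\{B_t^2\})(B_s^2 - \mathbb{E}\{B_s^2\})\}. \qquad (6)$$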

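Since $B_t$ is a zero-mean Gaussian process, Isserlis' (Wick's) theorem gives $\mathbb{E}\{B_t^2 B_s^2\} = \mathbb{E}\{B_t^2\}\,\mathbb{E}\{B_s^2\} + 2\,\mathrm{Cov}^2(B_t,B_s)$, so this function can be calculated exactly using Eq. (1):

$$\mathrm{Cov}(B_t^2, B_s^2) = 2\,\mathrm{Cov}^2(B_t, B_s) = \frac{K^2}{2}\left(t^{2H} + s^{2H} - |t-s|^{2H}\right)^2. \qquad (7)$$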
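The Gaussian identity behind Eq. (7) is easy to check by Monte Carlo. The sketch below (NumPy assumed; `cov_fbm` and `check_eq7` are hypothetical names) draws the pair $(B_t, B_s)$ from the bivariate normal law with covariance (1) and compares the sample covariance of the squares with $2\,\mathrm{Cov}^2(B_t,B_s)$. It only tests the two-point identity, not the full path structure of fBm.

```python
import numpy as np


def cov_fbm(t, s, H, K=1.0):
    """Eq. (1) with D = K/2."""
    return 0.5 * K * (t ** (2 * H) + s ** (2 * H) - abs(t - s) ** (2 * H))


def check_eq7(t, s, H, K=1.0, n=1_000_000, seed=0):
    """Return (Monte Carlo estimate of Cov(B_t^2, B_s^2), prediction of Eq. (7))."""
    C = np.array([[cov_fbm(t, t, H, K), cov_fbm(t, s, H, K)],
                  [cov_fbm(s, t, H, K), cov_fbm(s, s, H, K)]])
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(np.zeros(2), C, size=n)  # samples of (B_t, B_s)
    sq = x ** 2
    empirical = np.cov(sq[:, 0], sq[:, 1])[0, 1]
    return empirical, 2.0 * C[0, 1] ** 2


if __name__ == "__main__":
    # The two numbers should agree to within a few percent.
    print(check_eq7(t=1.0, s=0.4, H=0.3))
```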



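Inserting the latter expression into Eq. (5) and noticing that the kernel is a symmetric function of $t$ and $s$, we have

$$\mathrm{Var}(u) = \frac{\int_0^T dt\, \omega(t) \int_0^t ds\, \omega(s)\left(t^{2H} + s^{2H} - (t-s)^{2H}\right)^2}{\left(\int_0^T dt\, t^{2H}\,\omega(t)\right)^2}. \qquad (8)$$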
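Equation (8) can be evaluated numerically for any weight function. The sketch below uses SciPy's `integrate.dblquad` (whose integrand receives the inner variable first) over the triangle $0 \le s \le t \le T$; the function name `var_u_eq8` is hypothetical, and the default quadrature tolerances are assumed adequate for smooth weights. For standard BM with $\omega = 1$ the integrand reduces to $4s^2$ and the result is exactly $4/3$, which makes a convenient check.

```python
from scipy import integrate


def var_u_eq8(omega, H, T):
    """Var(u) from Eq. (8) for a weight function omega(t) on [0, T]."""
    def integrand(s, t):  # dblquad passes the inner variable (s) first
        kernel = t ** (2 * H) + s ** (2 * H) - abs(t - s) ** (2 * H)
        return omega(t) * omega(s) * kernel ** 2

    # Outer variable t in [0, T], inner variable s in [0, t].
    num, _ = integrate.dblquad(integrand, 0.0, T, lambda t: 0.0, lambda t: t)
    den, _ = integrate.quad(lambda t: t ** (2 * H) * omega(t), 0.0, T)
    return num / den ** 2


if __name__ == "__main__":
    # Standard BM (H = 1/2) with uniform weight: the exact value is 4/3.
    print(var_u_eq8(lambda t: 1.0, H=0.5, T=1.0))
```

For a uniform weight and general $H$, Eq. (8) reduces to $\frac{\gamma}{2}\left[1 + \frac{2}{2\gamma - 1} - 2B(\gamma,\gamma)\right]$ with $\gamma = 1 + 2H$ and $B$ the Beta function, which is a second independent check.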



Following Ref. [14], we choose

$$\omega(t) = \frac{1}{(t_0 + t)^{a}}, \qquad (9)$$



where $t_0$ is a lag time and $a$ a tunable exponent. In a discrete-time description, $t_0$ can be set equal to the interval between successive measurements [14]. We thus identify $t_0$ as a resolution parameter in the present continuous description. We also note that in [14] it was proven that a power-law weight function of the type in Eq. (9) is optimal among all weight functions. Fixing $t_0$ and scanning over different values of $a$, we seek the value for which the variance of $u$ is smallest. Ideally, for such a value the variance should vanish in the limit of infinite resolution or infinite data size, i.e. when the parameter $\varepsilon = t_0/T$ tends to zero. To check the latter point, we first consider the limit of an infinitely long observation time, $\varepsilon = 0$. For $a < \gamma_H = 1 + 2H$ the integrals in Eq. (8) can be performed exactly, yielding



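$$\mathrm{Var}(u) = \frac{\gamma_H - a}{2}\left[\frac{1}{1-a} + \frac{2}{\gamma_H - a} + \frac{1}{2\gamma_H - 1 - a} - \frac{2\,\Gamma(1-a)\,\Gamma(\gamma_H)}{\Gamma(1 + \gamma_H - a)} + \frac{\Gamma(1-a)\,\Gamma(2\gamma_H - 1) - 2\,\Gamma(\gamma_H)\,\Gamma(\gamma_H - a)}{\Gamma(2\gamma_H - a)}\right], \qquad (10)$$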



where $\Gamma(\cdot)$ is the gamma function. On the other hand, for $a > \gamma_H = 1 + 2H$ and $\varepsilon = 0$, the result in Eq. (8) can be conveniently represented as a single integral

$$\mathrm{Var}(u) = \frac{\Gamma(2\gamma_H)\,\Gamma(2a - 2\gamma_H)\,\Gamma^2(a)}{\Gamma(2a)\,\Gamma^2(a - \gamma_H)\,\Gamma^2(\gamma_H)} \int_0^1 dx \left(1 + (1-x)^{2H} - x^{2H}\right)^2 {}_2F_1(a, 2\gamma_H; 2a; x), \qquad (11)$$

where ${}_2F_1(\cdot)$ is the Gauss hypergeometric function. The integral in Eq. (11) can also be performed exactly by using the series representation of the hypergeometric function and then resumming the resulting series. However, the expression obtained is rather lengthy, as it contains several generalized hypergeometric functions ${}_3F_2(\cdot)$. On the other hand, the result in the form of Eq. (11) can be tackled by Mathematica; in addition, its asymptotic behavior can easily be extracted from it, so we prefer to work with the compact expression (11) rather than with an exact but cumbersome one.

FIG. 1. (color online) The variance in Eqs. (10) (for $a < 1 + 2H$) and (11) (for $a > 1 + 2H$) as a function of $a$, for different values of the Hurst parameter $H$.
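Equation (11) is straightforward to evaluate with SciPy's Gauss hypergeometric function `scipy.special.hyp2f1` and `scipy.integrate.quad`. The sketch below (hypothetical name `var_u_eq11`) computes the Gamma-function prefactor in log form with `gammaln` to avoid overflow. As a test value for $H = 1/2$, our own elementary evaluation of Eq. (8) with $\omega(t) = (t_0 + t)^{-a}$ and $\varepsilon = 0$ gives $\mathrm{Var}(u) = 2(a-2)/(2a-3)$ for $a > 2$; this closed form is not stated in the text and is used here only as a check.

```python
import numpy as np
from scipy import integrate, special


def var_u_eq11(a, H):
    """Var(u) from Eq. (11): epsilon = 0 and a > gamma_H = 1 + 2H."""
    g = 1.0 + 2.0 * H
    if a <= g:
        raise ValueError("Eq. (11) requires a > 1 + 2H")
    # Gamma-function prefactor of Eq. (11), computed in log form to avoid overflow.
    log_pref = (special.gammaln(2 * g) + special.gammaln(2 * a - 2 * g)
                + 2 * special.gammaln(a) - special.gammaln(2 * a)
                - 2 * special.gammaln(a - g) - 2 * special.gammaln(g))

    def integrand(x):
        kernel = 1.0 + (1.0 - x) ** (2 * H) - x ** (2 * H)
        return kernel ** 2 * special.hyp2f1(a, 2 * g, 2 * a, x)

    integral, _ = integrate.quad(integrand, 0.0, 1.0, limit=200)
    return np.exp(log_pref) * integral


if __name__ == "__main__":
    for a in (2.5, 3.5, 5.0):
        print(a, var_u_eq11(a, H=0.3))
```

Near $x = 1$ the hypergeometric factor may diverge as $(1-x)^{a - 2\gamma_H}$, but the squared kernel vanishes as $(1-x)^{4H}$, so the integrand behaves as $(1-x)^{a-2}$ and remains integrable for $a > \gamma_H$.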

In Fig. 1 we show the dependence of the variance of the estimator $u$ on the exponent $a$, for different values of the Hurst index $H$. We notice that for any fixed $H$, the variance vanishes as $a$ approaches $1 + 2H$ and is nonzero for any other value. This means that for a fractional Brownian motion with Hurst index $H$, the estimators in Eq. (3) with power-law weight functions $\omega(t) = (t_0 + t)^{-a}$ possess an ergodic property only when $a = 1 + 2H$.
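The closed form (10) can be coded directly with `scipy.special.gamma` and compared against a quadrature. Substituting $s = t x$ in Eq. (8) with $t_0 = 0$ shows (our own intermediate step, not written in the text) that for $a < \gamma_H$ the variance reduces to $\frac{\gamma_H - a}{2}\int_0^1 dx\, x^{-a}\left(1 + x^{2H} - (1-x)^{2H}\right)^2$, which `var_u_direct` evaluates numerically. Both names are hypothetical. Eq. (10) has a removable singularity at $a = 1$, so that exact point is avoided; the usage example scans $a$ at fixed $H$ to reproduce the trend of Fig. 1 qualitatively.

```python
import numpy as np
from scipy import integrate, special


def var_u_eq10(a, H):
    """Var(u) from Eq. (10): epsilon = 0, a < gamma_H = 1 + 2H, a != 1."""
    g = 1.0 + 2.0 * H
    G = special.gamma
    bracket = (1.0 / (1.0 - a) + 2.0 / (g - a) + 1.0 / (2.0 * g - 1.0 - a)
               - 2.0 * G(1.0 - a) * G(g) / G(1.0 + g - a)
               + (G(1.0 - a) * G(2.0 * g - 1.0) - 2.0 * G(g) * G(g - a)) / G(2.0 * g - a))
    return 0.5 * (g - a) * bracket


def var_u_direct(a, H):
    """Same quantity as (g - a)/2 * int_0^1 x^{-a} (1 + x^{2H} - (1-x)^{2H})^2 dx."""
    g = 1.0 + 2.0 * H

    def integrand(x):
        # 1 - (1-x)^{2H} written with expm1/log1p to avoid cancellation near x = 0.
        k = x ** (2 * H) - np.expm1(2 * H * np.log1p(-x))
        return x ** (-a) * k ** 2

    val, _ = integrate.quad(integrand, 0.0, 1.0, limit=200)
    return 0.5 * (g - a) * val


if __name__ == "__main__":
    H = 0.3  # gamma_H = 1.6; the variance should drop to zero as a -> 1.6
    for a in np.linspace(-1.0, 1.59, 8):
        print(f"a = {a:5.2f}  Var(u) = {var_u_eq10(a, H):.4f}")
```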

The last issue we discuss is the decay rate of the variance when $\varepsilon$ is small but finite, in the ergodic case $a = 1 + 2H$. It is straightforward to show from Eq. (8) that in the limit $\varepsilon \to 0$ the variance is given to leading order by

$$\mathrm{Var}(u) \simeq \frac{C(H)}{\ln(1/\varepsilon)}, \qquad (12)$$

where $C(H)$ is a constant defined by

$$C(H) = \int_0^1 \frac{dx}{x^{1+2H}}\left(1 + x^{2H} - (1-x)^{2H}\right)^2, \qquad (13)$$
which exists for any $H \in (0,1)$. This result generalizes that of Ref. [3] for ordinary Brownian motion. We conclude that the variance of the estimator vanishes logarithmically with the total observation time. In other words, the diffusion constant estimated from one trajectory by this method tends toward the correct value logarithmically slowly. The prefactor $C(H)$, which is displayed in Fig. 2, reaches a minimum at $H^* \simeq 0.30$. From Fig. 3 we notice that, keeping the resolution $\varepsilon$ fixed, the variance of $u$ will typically be small for processes with $H \in [0.15, 0.6]$. This interval encompasses almost all the anomalous exponent values reported in single-particle studies. Conversely, the function $C(H)$ diverges as $H \to 0$ or $H \to 1$. Therefore, we can expect that, even with the ergodic choice of $a$, the estimates of the diffusion constant become highly inaccurate for nearly localized or nearly ballistic fBm processes.

FIG. 2. Prefactor $C(H)$ in Eq. (12) as a function of the Hurst index.
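The prefactor (13) and the location of its minimum can be computed with SciPy. To leading order, for $a = 1 + 2H$ the weight integral in the denominator of Eq. (8) grows as $\ln(1/\varepsilon)$ (so the denominator as $\ln^2(1/\varepsilon)$) while the numerator grows as $C(H)\ln(1/\varepsilon)$, which is the origin of Eq. (12). In the sketch below `C_quad` integrates Eq. (13) directly. `C_closed` is a closed form we derived for this illustration by expanding the square in Eq. (13) and continuing the resulting Beta integrals analytically, $C(H) = \Gamma(-2H)\Gamma(1+4H)/\Gamma(1+2H) + 2\pi/\sin(2\pi H) + 2[\psi(1+2H) + \gamma_E]$ with $\psi$ the digamma function and $\gamma_E$ Euler's constant; it is not given in the text, it has removable poles at $H = 1/2$ (where $C = 4$), and it serves only as a cross-check. All names are hypothetical.

```python
import numpy as np
from scipy import integrate, optimize, special


def C_quad(H):
    """Prefactor C(H) of Eq. (12), by direct quadrature of Eq. (13)."""
    def integrand(x):
        # 1 - (1-x)^{2H} written with expm1/log1p to avoid cancellation near x = 0.
        k = x ** (2 * H) - np.expm1(2 * H * np.log1p(-x))
        return k ** 2 / x ** (1 + 2 * H)

    val, _ = integrate.quad(integrand, 0.0, 1.0, limit=200)
    return val


def C_closed(H):
    """Closed form derived for this illustration (not valid exactly at H = 1/2)."""
    h = 2.0 * H
    return (special.gamma(-h) * special.gamma(1.0 + 2.0 * h) / special.gamma(1.0 + h)
            + 2.0 * np.pi / np.sin(np.pi * h)
            + 2.0 * (special.digamma(1.0 + h) + np.euler_gamma))


def argmin_C(bounds=(0.1, 0.9)):
    """Locate the minimum H* of C(H) with a bounded scalar minimizer."""
    return optimize.minimize_scalar(C_quad, bounds=bounds, method="bounded")


if __name__ == "__main__":
    res = argmin_C()
    print(f"H* = {res.x:.3f}, C(H*) = {res.fun:.3f}")
```

The located minimum should be close to the value $H^* \simeq 0.30$ quoted above; for $H = 1/4$ both functions reduce to $2\pi - 4\ln 2$.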

In conclusion, we have shown that the true, ensemble-averaged generalized diffusion coefficient $K$ of a fractional Brownian motion of known Hurst index $H$ can be obtained from single-trajectory data using the weighted least-squares estimator in Eq. (3) with the weight function $\omega(t) = 1/(t_0 + t)^{1+2H}$. Such an estimator possesses an ergodic property, so that $K$ can be evaluated with any necessary precision, but at the expense of increasing the observation time $T$ (or decreasing $t_0$). A limitation of the present class of estimators, which are based on single-time functionals of $B_t^2$, is admittedly their slow convergence toward the ensemble average. Two-time functionals, based for instance on the time-averaged MSD, exhibit faster convergence: for fBm with $H < 3/4$ the relative variance of the time-averaged MSD vanishes linearly in the ratio of the lag time to $T$ [15]. Nevertheless, these other estimators might be more sensitive to measurement errors and may not be accurate when diffusion is no longer a pure process but a mixture of processes with different characteristic times. A quantitative comparison between estimators beyond the ideal cases considered here is a necessary future step.